UNCLASSIFIED 


Rep^iaduced 

Ute 


ARMED  SERVICES  TECHNICAL  INFORMATION  AGENCi’ 
ARLINGTON  HALL  STATHW 
ARLINGTON  12,  VIRGINIA 


UNCLASSIFIED 


NOTICE:  When  government  or  other  drRwings,  speci¬ 
fications  or  other  data  are  used  for  any  pujcTpose 
other  than  in  connection  vlth  a  definitely  related 
government  procurement  operation,  the  U.  S. 
Government  thereby  incurs  no  responsibility,  nor  any 
obligation  whatsoever;  and  the  fact  that  the  Govern¬ 
ment  may  have  formulated,  furnished,  or  in  any  way 
supplied  the  sai'’  drawings,  specifications,  or  other 
data  is  not  to  be  regarded  by  implication  or  other¬ 
wise  as  in  any  manner  licensing  the  holder  or  any 
other  person  or  coiTporation,  or  conveying  any  ri^ts 
or  permission  to  manufacture,  use  or  sell  any 
patented  invention  that  may  in  any  way  be  related 
thereto. 


^  MEMORANDUM 
RM-3329-PR 

OCTOBER  1062 


I/O 


i 

%  '*•>  T 

fc^c:si 

'.trCCO 


j 


i 

I 

I 


!  THE  METHOD  OP  LEAST  SQUARES 
I  AND  OPTIMAL  FILTERING  THEORY 


Y.  C.  Ho 


PREPARED  FOR: 

UNITED  STATES  AIR  FORCE  PROJECT  RAND 


7^ 


R-fl  n 


SANTA  MONICA  •  CAitfOtNIA 


MEMORANDUM 

RM-3329-PR 

OCTOBEl-  962 


THE  METHOD  OF  LEAST  SQUARES 
AND  OPTIMAL  FILTERING  THEORY 

Y.  C.  Ho 


This  research  is  sponsored  by  the  United  Slates  Air  Force  under  Project  RAND  —  Con¬ 
tract  No.  AF  49(6.tR)-700- monitored  by  the  Director.'te  of  Development  Planning, 
Deputy  Chief  of  Staff,  Research  and  Technology,  USAF.  Views  or  conclusions  con¬ 
tained  in  this  Memorandum  shoi.ld  nut  be  interpreted  as  representing  the  ufficial  opinion 
or  policy  of  the  United  Stales  Air  Force.  Permission  tv  uuote  from  or  reproduce  purtioiM 
of  this  Memorandum  must  be  obtained  from  The  RAND  Corporation. 


ill 


PREFACE 

This  Memorandum  reports  on  pcu^t  of  RAED's  contlir>:lng  vork  on 
Guidance  and  Orhit  Mechanics.  It  Is  a  theoretical  diccusslon  of 
methods  of  estimating  unknown  parameters  -:j  the  prer^nce  of  noise, 
and  can  be  applied  ‘:o  problems  of  estimating  or  predicting  space 
ti-ajectories  or  orbits.  It  should  be  of  interest  to  mathematical 
statisticians,  canaainications  engineers,  and  those  concerned  vl'^h 
orbit  mechanics. 
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SUMHART 

Thl«i  Kemorandua  demonstrates  the  corres)>ondenee  between  the 
well>lmown  method  of  least  sq.uareB  and  the  :more  recent  oiptimal 
filtering  theory  of  Kalman.  It  shows  that  via  a  sl]ig;>le  lama  on 
matrix  inversion,  most  of  the  re-^ults  In  linear  filtering  and 
prediction  theory  can  be  easily  derived.  Ihe  connection  of  the 
least  square  method  with  the  so-called  Duality  Principle  in  optiaal 
conti’o''  theory  Is  also  discussed.  Ihls  connection  places  In  evidence 
the  mathematical  similerltlea  between  prohlens  of  control  and  problems 
of  prediction.  Ihe  Memorandum  concludes  with  a  proposed  application 
for  orbit  determination  of  a  24>hour  satellite  using  the  techniques 
described.  Ibis  application  is  concerned  with  computing  corrections 
to  the  satellite's  orbital  parameters  based  on  noisy  observations 
of  azimuth  and  elevation  angles  by  improving  an  Initial  orbital 
parc.5et*  r  istlmation  through  additional  observations.  The  orbital 
parameter  corrections  may  then  be  used  as  the  input  to  an  orbit 
trunsfer  process  or  to  ruflixe  a  preliminary  orbit. 
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A  HOTB  on  SYMBOLS 


The  foUeving  conventions  for  notation  are  adopted  here; 
all  upper  :ase  Roman  or  Greek  letters  are  matrices;  lover  case  Roman 
letters,  ve'-tors.  Scalars  are  denoted  by  lover  case  Greek  letters. 
Subscripts  are  used  to  denote  cooponents  of  matrices  or  vectors  with 
the  understanilng  that  these  conponents  themselves  may  be  matrices 
or  vectors.  Often,  vhen  it  is  clear  in  the  context,  a  subscript 
(c.g.,  k)  will  be  used  to  denote  the  value  of  the  matrix  or  vector 

,  For  exa]q>le,  means  the  value  of  the 
andx^  15  its  5th  coe^ponent.  The  transpose  of 
a  matrix  or  vector  is  indicated  by  prime  ' .  The  quadratic  form  x'Ax 
is  often  written  as  ||x  ||^. 


vector  X  at  time  tj^. 


at  the  Instant  of  time,  t^^ 
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I.  IHTRODUCTIOiy 

The  least  bnuares  method  has  been  Investigated  and  put  to  use 
over  a  long  period  of  time.  A  historical  survey  of  the  suboOet  going 
as  far  back  as  Gauss  is  neither  appropriate  nor  possible  here. 

Interested  readers  are  referred  to  Ref.  1  for  a  more  coo^lete  treat¬ 
ment.  Recently,  due  to  the  increasing  emphasis  on  research  in  space 
science  such  as  trajectory  determination  and  optimization,  stochastic 
control  theory,  etc.,  there  has  been  considerable  renewed  Interest 
in  the  theory  of  least  squares  estimation  and  its  applications* 

Notably,  there  have  been  two  approaches  to  the  subject.  As  one  ap¬ 
proach,  we  can  treat  the  problem  as  one  of  stochastic  aptimizatlon. 
?ollo':l.jg  Wiener,  but  using  the  more  powerful  state-space  techniques, 
we  compute  the  conditional  probability  dlstrlbutioi:  function  of  the 

unknown  variables.  From  these  distributions,  the  least  sqpare  estimate 
(2  3) 

can  be  computed.'  *  '  The  problem  can  also  be  approached  from  the 

statistical  estimation  point  of  view.  We  calculate  the  likeli¬ 

hood  estimates  of  the  unknown  variables  and  show  that,  in  the  case 

of  gausu*an  noise,  the  estlntes  are  optimal  in  the  least  squares 

(4)  * 

sense.'  While  it  is  known  that  these  two  approaches  must  yield 
similar  results,  so  far  as  is  known,  the  exact  correspondence  and  its 
various  implications  he.ve  not  been  stated.  It  is  the  purpose  of  this 
Ms!»orandum  to: 

o  demonstrate  this  correspondnece  throu^  a  very  elementary 
derivatiur.  and  a  remarkable  theorem  on  matrix  inversion 

o  reconcile  all  known  and  useful  results  obtained  from  the  two 
different  approaches 

Also,  Bryuon,  A.  E.  and  M.  Frazer,  private  cosmiunication,  April  1962 
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o  DJsciiss  the  isqpllcatlons  of  these  results  and  their  ap¬ 
plications  to  problems  of  orbit  determination  for  satellites. 


[.  A  DEIERMnilSTIC  ISRIVATIOR 


fPi£>IIPEPENDan’  CA>g 

We  shall  i-he  'liscusslon  with  a  viry  biiaple  problen. 

Once  the  deri’/atlon  has  been  carried  tlircu^  for  this  case,  gener¬ 
alization  to  nore  Involved  and  practically  more  useful  cases  is 
strai^tforward.  Let  us  consider  the  set  if  equations. 


'ir  ~  “K 

where  x  is  an  unknown  n-vector,  is  a  given  an  aatrlx  where 
a  >  n  and  is  a  given  m-vector.  Here  coaponents  of  usually 
play  the  role  of  observed  data  which  are  related  to  x,  the  unkncnm 
but  desired  parameters,  through  the  theoretical  relationship  of  Eq. 
(!)•  In  general,  of  course,  Eq.  (l)  does  not  possess  a  solution 
due' to  inevitable  measurement  errors,  etc.  It  is  thus  reasonable 
to  pose  the  problem  of  determining  *  *  ^  such  that  the  error 

~  ^  ~  ^  minimized  in  some  sense.  In  particular  we  consider 

the  quadratic  fora 

^  iv '  '  <v '  *ic>’  -  *K>  (2) 

where  is  given  mxm,  positive  definite  (i.e.  1^  also  exists) 

» 

syanetric  matrix. 


•  „1  _ 

The  iatcrprctalior.  cf  will  be  given  In  Section  TH.  Clearly 

*  Identity  matrix  =  I,  then  J  is  sisply  the  sum  of  the  squared 
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The  physlcHl  origin  of  the  above  prohlem  may  be  visualized  as 
follows:  At  various  time  Instanta  t^,  we  make  correspowiing 

sets  of  measurements  z^^,  z^,  ...  Zj^.  Each  set  of  these  measurements 
is  theoretically  related  to  a  set  of  unknown  parameters  via  the 
relationship 


H^x  = 


A  collection  of  these  relationships  for  each  set  of  measurements 
then  yields  Eq.  (l)  where  denotes  the  matrix  formed  by  the  sub¬ 
matrices  and  Zj^,  the  vector  formed  by  z^^,  ...,  Zj^. 

Now,  upon  setting  grad  J  =  0,  we  obtain  the  well-known  result  that 

*  ■  \  ■  <”k  \  (3) 


where  it  is  assumed  that  the  columns  of  ere  linearly  independent 
(i.e.,  RjJ  I^)  exists).  Now  suppose  some  additional  observations 
have  been  made,  as  represented  by  an  r-vector  then  ask  what 

change  should  be  made  to  Xj^  such  that  the  new  ^ 

is  again  optimal  with  respect  to  the  entire  set  of  data 
(zj^,  Equation  (l)  now  has  the  form 


h  *  ^\+i] 

y__ 

\+l 

\+i 

('0 


* 

If  this  is  not  satisfied  in  practice  it  usually  implies  that 
on  observation  scheme  has  some  fundamental  defects.  Again  we  shall 
postpone  a  discussion  on  this  point  to  later  sections. 


P.nd  It  Is  filfiPT  thar  J  aiu^uld 


11V\ "  ^^+1^  “  ^  ll^-i ^  IW^i^K  +  “  \*i\\ti 

> 

■’  ■  IK«  'I '  -1  „  "  -  vj'-i  (5) 

^  \  \  h*l 


since  Hj^  Xj,  -  Cj^  =  0.*  Ibe  first  term  in  Eq,  (5)  has 

the  obvious  interpretation  of  being  the  cost  of  deviating  from  the 
previously  determined  optimal  for  the  set  of  data  Zj^.  Now  setting 


=  0  we  obtain 


^\+l  ^\+l  “  ®k+l  ^+1  ®lt+l^  H.*x  ^.+1 


^+1 


Equation  (6)  suggests  a  successive  Improvement  scheme  for  deterTnining 
the  urJcnown  x  as  more  and  more  observations  are  made.  If  ve  define 

(“i  I?'  %'■’  ■  ■'k  ’  “k-  *  “{.1  ‘ii  •  ’’k.i 


■’i:.\-"'k"  ‘  “k.i 


**k+l  “  ^k+1  "^k+l  ^+1  '^k+1  ■  ^+1  *k^ 


Equation  (5)  is  actually  an  Implicit  statement  of  the  Principle 
of  Optimality  and  can  lead  to  a  dynamic  programming  solution  of  the 
problem,  as  was  first  pointed  out  by  Bellman  (Ref.  5). 
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are  the  desired  recursiou  fomiulas.  Tliis  is  the  stp.gewise  estimation 

(>i) 

scheme  suggested  by  Swerling'  '  and  more  recently  but  independently 
)(> 

by  Erycon  and  Fraser,  However,  the  con!pulati''-.’ai  scheme  of  Eqs, 

(7)  and  (8)  has  one  drawback—the  inversion  of  the  nxn  matrix 
must  be  carried  out  at  every  stage.  This  is  a  straightforward  but 
nevertheless  somewhat  unpleasant  task.  However,  this  difficulty  can 
be  circumvented  by  the  fo3J.cwlng  remarkable  Matrix  Inversion  Lemma 
which  is  well  known  in  nuiierical  analysis  but  not  in  control  engineer¬ 
ing. 

Lemma;  If  h  Hj^^^  where  P,  R  are  symmetric  and 

nonsingular,  and  R  >  0 

then  exists  and  is  given  by 

^.1  K*1  (”*.1  \  “k..l 

Proof!  by  direct  substitution  we  have  (dropping  unnecessary  subscripts) 

"■m  “  I  •  H  -  H-  R-1  H  P^  n-  (a  Pj  H- 

+  R)"^  H  -  H'  Pj^  H»  +  R)"^  H  Pj^ 

■'■jap^.i  Q.E.S. 

How  Eq.  (9)  can  take  the  place  of  Eq.  (7)  with  an  luroortant  difference. 
Tlie  iiatrix  +  \+l^  dimension  rxr  where  r  is  the 

number  of  new  observations.  Since  we  are  at  liberty  to  process  new 
data,  one  or  a  group  at  a  time,  r  can  be  simply  taken  as  1.  Ihus, 

Lhe  matrix  inversion  problem  has  been  eliminated  in  the  stagewise 
correction  scheme. 

^Bryson,  A.  E.  and  M.  ITazcr,  private  commurlcation,  April  1902. 


=  I  + 


H*  R"^  - 


-1 


R  (H  P.  H«  +  R)  (H  P.  H*  +  R) 
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TDIE-DEPENDEirr  CASE 

So  far,  we  have  treated  the  problem  assuming  that  the  elements 
of  the  ratrix  and  are  given.  It  is  usual  to  postulate  (with 
good  Justification  in  many  cases)  that  the  vector  x  ia  the  state 
of  a  force-free  linear  dynamic  system  and  that  it  changes  with  time 
according  to  some  deterministic  and  known  relationship. 

Vi  =  \  \ 

Thus,  considering  the  dynamic  system  as  shown  in  Pig.  1  we  have  at 
every  instant  the  set  of  theoretical  relationships 


V|^  (noise) 


k+1 


One  period 
delay 


Fig.  I  —  Dynamic  system  representation  of  observations 


Hi  x^  = 


\+i  ^.+1  “  ®k+l 


b 


Since  the  equations  for  the  dvnamic  system  are  knovn,  knowledsc 
of  the  state  of  the  system  at  any  one  time  Implies  knowledge  of 


the  state  at  any  other  time.  Now  suppose  that  re  already  have  an 
estimate  ^  based  on  ,  Zj^  and  wish  to  obta  In  an  estimate 

by  incorporating  the  new  data  above  equations  can 

then  be  rewritten  as 


where  we  have  utilized  the  properties  of  the  transition  matrix 

t^)  -  tj.)  •(t^,  t;^) 

ar^  ty)  *  i  ^0*“  0 
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J  -  IK+i  \ 


Ax^^l)  .  Zj, 


TTie  second  term  in  J  measures  the  cost  ox  deviating  from  the  previoxis 
optiioum  by  the  amount  for  the  data  set  z^,  A  change 

at  tj^^^  is  equivalent  to  ^®nce  the  inclusion  of  the 

factor  in  the  expression.  Minlr.izlng  J,  vr.  get 


Klx  <Vi  -  »k,i « “i.)  (“) 

I 

(“) 


A  simple  generalization  of  the  matrix  inversion  lemma  yields  (again  drop- 
dropping  the  obvious  subscripts  for  simplicity) 

=  ♦  Fj^  ♦  Pj^  1I«  (H4>  Pj^^  4>'  H*  +  n)’^  H*  Pj^  ♦* 

or 

Vi/kn "  ’’k+i/k  -  ''K.i/k  ">  W »’  ‘  ">■" « W  (^3) 


=  ♦  Pj^  o'  as  the  slmpJc  extrapoiation  of 
Pj^  to  >rtille  value  of  vhen  the  r 

additional  measurements  are  incorporated. 

Furthermore,  Eq.  (il)  can  he  rcvritteix  as 

4.1  •  \.i/k  “•  (“  Vi/k  «■■  *  ('k.i  - » •  4  <!’'> 


Here  we  can  think  of  P, 


k+l/k 
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ty  showini?  that 

»'  (»  Via  "  ViA« 

Equations  (13)  and  (l4)  axe  computationally  slightly  more  convenient 
that  Eqo.  (ll)  and  (l3). 

It  will  now  be  shown  that  the  purely  deterministic  derivation  given 
above  has  a  natural  probabil . 'tic  interpretation  which  easily  connects 
and  reconciles  the  various  currently  known  results  in  estimation. 


III.  PROBABILISTIC  INTERFBETATION 


To  get  a  protabllistic  interpretation  of  our  results,  ve  pro¬ 
ceed  to-  inaKe  one  additional  assuniption.  'rfe  shall  assiune  thf-  dis¬ 
crepancies  in  Eq.  arc  caused  by  measurement  noises  and  that  the 
meae’irement  noise  can  be  adcjquately  described  by  the  Joint  gaus- 
sian  probability  distribution  function 

Vj,i>  • . . ,  Vj,^)  =  — 

(2jf)  det  (r"-^) 


572 


"k 


(15) 


with  zero  mean  and  covariance  matrix  and  that  every  set  of  r 
measurements  ere  statistically  independent.  Thus,  the  Joint  pro¬ 
bability  density  of  the  set  of  measurement  Zg,  ...,  Zj^  is  simply 
a  product  of  k  expressions  similar  to  Sq.  (15).  Now  if  we  make  the 
change  of  variable  for  the  problem  in  Section  II, 


\  "  V 


(16) 


and  noting  tlmt  the  Jacobian  of  the  transformation  is  1,  we  obtain 
the  Joint  probability  density  function  for as, 

^  V  - TiTs  ['  i  ^ 

(2rt)  (det  R.'^) 

-  (17) 

where  RjJ^  is  a  block  diagonal  consisting  of  R"^,  . . . ,  R^^^  and  m  is 
the  dimension  of  7^.  I-:  order  to  maximize  the  likelihood  (hence 


1? 


the  symbol  L  I'cv  Eq.  (IT))  that  the  given  7.^  are  produced  by  u  certe.ln 

X  we  choose  x  so  as  to  minimise  the  exponent  In  Sq,  (l7)»  i.c., 

2 

minimize  J  =|jZj,,  -  which,  of  course,  leeuls  to  the  solution 

Xj^  =  (H^  P“^  H^)"^  Rj^l  Zj^.  But  by  Eq.  (l6)  we  get 

.  V  <  (H,.  p-l  H^)-l  ..  (18) 

4 

Which  l=ads  to  the  expvessior  that 

E(^)  =  X  (19) 

00.  ,^)  .  s[(x  -%)(.-  S^)'].  (Bf  V''  - 

by  direct  calculation.  When  r  more  measurements  are  Incorporated,  we 
have  in  addition  to  Eq.  (I6) 

^+1  *  *  Vl  *  \+l 

Now,  because  of  the  irilepeulence  of  the  observed  errors  (Vj^, 
the  liXellhood  function  for  the  data  (zj^,  uhknoira 

parameters  j;  ■=  Xj^  +  (of  Eq.  (?))  is 

^\+i>  - 7jr-^ — 

(2n)  (det  (P";^)) 

*  ‘vi»'  <Vi  - 

•  m/?.  ,  1/p 

(2s)  (det  (pJJ  )) 

<22)* 

'* 

Since  measures  the  covariance  of  the  error  of  the  estimate  x. 
(Eq.  {19)) t  ^ue  second  half  of  the  term  is  the  likelihood  of  the  errOT 
Ax  of  xit.  For  exaa^le  if  is  diagonal  and  P^i  is  very  small  (is^lylng 
considerable  coni  Idence  in  the  estimate  xn) ,  then  a  larger  pewlty  is 
iagposed  on  guessing  Axj^  awigr  from  tero.  See  also  the  detenainistic 
interpretation  In  Sectlcxt  II. 
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iJaxiiniz  .1  ug  E>1.  (22)  with  respect  to  is  equivalent  to  minimizing 

the  sum  of  the  exponents.  He  are  then  led  to  the  same  e:q?ression 
for  (Eq*  (8))*  The  new  estiaate  ~  \  ^\+i  various 

maaipulutlons  usini'  £qs.  (9)  and  (lo)  yields 

\+l  *  ^  \+l  ^  ^  ''k  ^43  ^+1  ^^k+l  \+l 

Thus  we  have 

E(^,j)  -  «  (2k) 

(\.l*  -*[(*-  *k.l>  (*  -  \.1>’] 

■  ■■k.i  (=i*i  Kh  ”k.i  ♦  V  ’’k.i 
-  fk«  t=5) 

Consequently,  we  arrive  at  the  following  well-known  results  that  in 
the  case  of  least  square  estimates  with  gaussian  noise  we  have: 

1.  The  least  squares  estimate  x  as  given  by  Eq.  (8)  and 
calculated  via  Eq.  (9)  is  a  gaxissian  random  vector  (since  frem  h 
(23)  we  know  it  is  merely  a  linear  combination  of  other  ’aussian 
variables)  with  mean,  x,  and  covariance  of  the  error  of  eSvi  atlon 

^k+l‘ 

2.  ’Hk  least  squares  estimate  is  unbiased  and  efficient  (i.e., 
it  has  minimum  variance  which  is  merely  the  statistical  way  of  saying 
"least  sq-iareo"). 

3.  The  conditional  probability  of  x  given  and  Zj^^^  is  a 

gaussian  random  vector  nrul  has  mean  x  and  covariance  (this 


is  evident  from  Eq.  (23)).  Tlius,  the  conditional  mean  of  x  is 
the  least  squares  estimate. 


Nothing  is  changed  if  we  consider  the  '.ime-dependent  case. 

All  the  formulas  in  Section  II  apply  without  modification.  At  any 
time  instant  tj^  we  itave, 

1.  A  current  best  estimate  Xj^  which  is  also  the  tonditional 

mean  of  Xj^  given  Zj^. 

2.  The  covariance  matrix  of  the  error,  ^k/k* 

3.  The  extrapolated  error  covariance  matrix  of  “ 

-  \)  is  «•. 

When  a  new  sot  of  measurements  z.  io  made  we  have 

iw+i 

U.  The  updated  estimate  ^  (Eq.  (U)  or  (l*»)) 

5.  The  updated  error  covariance  matrix 

(13). 

Furthermore,  a  simple  generalization  is  possible.  Suppose  we  con¬ 
sider  the  case  as  shown  in  Fig.  2. 


V|^  (noi$«) 


Fig.  2  — Dynamic  system  representation  of  observations 
with  input  disturbonce 
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Where  the  dynamic  syatem  io  cubjccl  to  some  random  disturbances, 
let 


-  0 

E(w^*  Wj)  =  Q'(k  -  J) 

E(wj^‘  v^)  =  0  for  all  k,  j 


i.c.,  w  is  a  white  gauss ian  random  process  independent  with  v.  It 
is  clear  that  the  effect  of  w  at  each  step  is  to  insert  an  uncertainty 
into  the  value  of  x  at  the  next  step.  Thus,  if  th»  covariance  of 
at  Xj^  is  then  the  extrapolated  covariance  of  "* 

0  Xj^  is  simply 


(26) 


The  covariances  add  since  we  assume  w  to  be  white.  li.en  if 

from  Eq.  (26)  is  used  in  Eq.  (13)  for  the  updating  of 

all  other  equations  hold  with  no  modifications.  Equations  (26), 

(l4)  and  (13)  are  exactly  what  were  derived  in  Eq.  (2)  via  the 

* 

Wlener-KalnBn  approoch. 

Consequently,  we  have  established  the  exact  correspondence  hetween 
the  two  seemingly  different  approaches  to  estimation.  The  missing 
link  between  the  two  is  provided  hy  the  I^atrix  Inversion  LcEma. 


Kalman  assumed  an  even  more  general  modt-l  where  v  and  v  arc 
correlated.  But  the  co.i5»lication  can  be  reduced  to  a  case  with  no 
correlation,  as  also  was  shown  in  Ref.  2.  In  all  other  respects, 
the  notations  here  and  those  in  Ref.  2  are  the  same. 


16 


The  main  computational  contrihution  of  the  Wiener-Kalcan  method  Is 
the  relative  3iniplioity  in  the  calculation  or  the  covariance  raatrlces 
P,  ,  .  However,  the  original  derivation  of  the  variance  Eq.  (12) 
is  consiuerahly  more  involved  without  the  use  of  the  Leana.  Con¬ 
ceptually,  the  Wiener-Kalijian  approach  via  the  calculation  of  the 
condii ioml  prohability  distribution  of  the  unknown  parameters  is 
more  lieneral.  Once  the  conditional  probabilities  are  confuted, 
optliaai  estlnvates  for  other  criteria  can  be  calculated.  In  fact , 
it  has  been  pointed  out  (Ref.  2)  that  the  conlltlonal  mean  is 
optimal  for  many  other  criterion  limction'*  than  the  least  squares  one. 
Purthernore,  the  likelihood  approach  is  not  particularly  has^red 
by  the  presence  of  nonlinear  functions  in  the  case  of  least  squares 
ostimtes  with  gaussian  noise.  We  shall  attenq>t  to  discuss  some 
aspects  of  this  nonlinear  estimation  problem  in  another  MetnomMum. 
The  present  Memorandum  is  aimed  primarily  at  the  linear  case  where 
the  two  approaches  are  identical. 

It  is  also  clanr  that  the  astlaation  scheme  Eqs.  (26),  (13) 
and  (?>‘)  remain  valid  If  w  la  nonstationery,  l.e.,  Q  is  time-varying. 
Purtbermore,  the  Initial  estimate  and  its  covariance  P  must  be 
pro'/lded  as  part  of  the  stairtup  data.  Usually  we  assume  x^  *  0 
and  choose  the  best  guess  for  P^.  An  alternate  procedure  is  to 
moke  at  least  n  measurements  first  before  producing  an  estimate 
and  then  let  P^  =  (H» 
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IV.  wAum  vrm  ccnrrROL  problbi 


There  is  an  Interesting  irathenv\tl~Ai  slssUarlty  between  the 

*,ro’olem  discussed  above  and  xhe  vell-Hxtown  quadratic  error  control 

(3) 

problem  as  rii'st  dlscuverral  by  Kalwiu.  Tit  tbc  pre^rit  coutert 
this  80*called  duality  is  even  more  transparent.  Recall  that  in 
Ss«L2.c*n  11  VC  consjldor  “tlic  i>rol)Xcs  of* 

f  Minimize  the  error  J  *  v^l 


for  the  set  of  equations 


W  -  M 

Pi  fj 


(problem  A) 


Let  us  new  consider  the  so-called  dual  problem  of 


Minimize  J  >  u^  Uj^ 


[*.][] 

[vj  -H 


(Problem  b) 


In  Problem  (A)  we  have  more  equations  than  uhkncvns  and  one  tries 
to  minimize  the  square  error  in  the  resultant  set  of  equations.  In 
(B)  ve  have  irore  unknowns  than  eqttations  and  one  tries  to  minimize  the 
square  of  the  solution  to  the  equations. 

Assume  Uj.  has  m  coeqxinents  and  A^^  is  nxm  where  m  >  n,  then  it 
is  well-known  that  the  solution  to  (b)  is 


Row  if  r  more  free  components,  and  r  more  columns  arc  added  to  'y 
and  Aj^,  respectively,  then  wc  hi»ve. 
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Vi 


N 


(28) 


and  the  optimal  solution  to  (b)  is 

\+l  \+l  ^+1  W+1  ^+1  ^+1  \  ^  ° 

^  H:  ^\+l  ^+1  ^+1  *  \  ^ 


Tlie  additional  r  components  of  u.  ,  are  given  by  a  formula  very  similar 
to  the  fonnula  for  yielding  the  correction  (^)‘  The 

striking  thing  is,  of  course,  the  matrix  ^+1  ^+1  ^ 

If  we  define  pJJ  -  (Aj^  Aj*)"^ 

^k+1  “  ^*K  ^  *  \+l  \+l  ^+1^ 

^k+1  \  ■*■  \+l  \+l  ^+1 


then  we  have  via  the  Matrix  Inverse  Lemma 

^k+1  ”  ^k  '  ^k  fk+1  ^^+1  \  \+l  *  \+l^  \+l  ^k 

which  is  exactly  the  same  at,  2q,  (l2)  if  we  identify  with 
This  is  really  the  essence  of  duality. 

Problem  (b),  of  course,  can  be  given  a  control  Interpretation. 
Vector  c  plays  the  role  of  desired  terminal  state,  ecmiponents  of 


matrix  of  influence  coefficients  which  relates  the  effort  at  one 
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Instant  to  the  state  at  the  terminal  time.  The  following  more  or 
less  obvious  statements  coneeiulng  problems  (a)  ar.d  (s)  iMv+her  Il¬ 
lustrate  the  concepts  of  duality; 

(A)  (B) 

For  an  infinite  stage  estimation  For  an  '  'inite  step  control 

process  with  stable  dynamics,  we  process  with  stable  dynamics 

can  identify  the  true  r  with  in-  we  can  achieve  the  desired  stat 

creasing  precision.  with  decreasing  effort. 


V/hen  no  observation  has  been 

made  we  have  no  confidence  in 

our  estimates  P  =  w. 

o 

The  columns  of  must  be  line¬ 
arly  Iniependent  so  that 
(I^  !^)"^  exists.  This  is 

called  observability. 


When  there  are  no  control  steps 
left,  the  desired  state  must 
bo  achieved  with  infinite 

■K 

energy  =  ». 

Hie  rows  of  must  be  linear¬ 
ly  independent  so  that 
(Ajj  exists.  This 

is  called  controllability. 


The  extension  of  the  above  ideas  to  the  time-dependent  case  of 
Figs.  1  and  2  is  strai(^t forward.  In  fact,  it  is  well-known  that 
for  a  discrete  dynamic  syst'jra 

*k+l  =  ^\+l’ 


2 

Q 


the  optimal  control  sequence  is  given  by  (via  dynamic  programming 
see  Kefs.  3  and  6) 
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„  .  .  (H*  H  4  R)  ^  K  P^.1.1  » 


.1.1  ■  41-1  -  W  ®*  <®*'  '■k-i-i  ”*  *  ”*'  '"-‘-1 


^H-i/N- 


*  .■*•*  T,* 

^N-i  “  '*' 


*  n*' 

(»  +  r  Qi 


Equations  (34)  and  (33)  are  exactly  the  same  as  (26)  and  (13)  if 


identify 


with  r* 


1  «  B-1#  • • • #  ® 


and  hence,  from  this  conies  the  name  Duality  Principle. 
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Y.  APPUGATIOW 

To  Illustrate  the  application  of  least  squares  estimation  theory 
to  trajectory  determination,  we  consider  the  case  cf  a  24-hour  satel¬ 
lite  In  a  circular  orbit.  Vltu  passage  of  tiiae,  the  satellite  tends 
to  drift  out  of  the  orbit  due  to  various  causes  such  as  gravitational 
perturbations,  etc.  Noisy  observations  arc  made  on  the  satellite  from 
the  ground.  It  has  been  shown  (Eq.  (6))  that  under  various  linearity 
assunqrtlons,  the  following  relationship  is  true 

Aq  »  C  Ap  (35) 
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Ap  = 


Aa/a 

A(e  coo  E^) 
A(e  sin  E^) 

Av 

o 

Av 


“  °ij  *» 


The  definitions  of  the  variables  are: 
A 


h_  = 


azimuth  nnsle  of  the  satellite  if  it  is  exactly  in  the 
circular  orbiL 

elevation  angle  of  the  satellite  if  it  is  exactly  in  the 
circular  orbit 


AA.  =  measured  difference  between  the  observed  azimuth  and  A, 


Ah.  = 
1 

9  = 
X  = 

P  = 
Ap  = 


at  time  t^ 

measured  difference  between  the  observed  elevation  and 
h^,  at  time  t^ 

time  of  observation 

longitude  of  observation  point 

latitude  of  observation  point 

set  of  orbital  parameters 

charjge  of  the  orbital  parameters  frean  the  reference  values 
of  the  perfect  circular  orbit 


Since  the  elements  of  C  are  independent  of  the  observations  emd  ore 
calculable  for  all  time,  ve  can  numerically  solve  the  variance  equa¬ 
tion  beforehand.  If  we  define  the  successive  rows  of  C  as  then 


(36) 
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k+1 


*k  '^k+1  \+l  'k 


K  (c 


P  c*  + 
k+1  k-1  k+1 


a^) 


-1 


2 

Where  is  the  variance  of  the  obsei’vation  errors  in  azimuth  and 
elevation  for  the  kth  observation.  Etiuation  (3‘;>)  can  bo  aomputed 
beforehauid  for  as  many  steps  as  needed.  A  rou^  estimate  of  the  number 
of  steps  (i.e.,  observations)  that  are  needed  con  be  obtained  by  check¬ 
ing  the  size  of  Tr(P. ),  since  dVCP,.)  lo  the  oum  of  the  variance  of 
the  error  of  the  estimates  Ap.  The  initial  condition  for  Eq.  (36) 
can  be  taken  to  be  any  reasonable  guess  Ap^.  A  good  way  cf  estinvitliig 
P^  would  be  to  compute  a  few  extra  row*!  of  c,  sr>y,  ..., 

X 

c*  and  form  the  matrix  C  .  Then  we  can  lot 


"  (C*’ 


T 

a 


(37) 


When  we  have  finished  computing  PjJs  for  a]Jl  k,  then  the  estimate  in 
the  cliange  in  orbltaJ  parameters  is  simply  given  by 

=  %.l  \  °k  ^^‘^k  -  (  36) 

0 

where  Aqj^  is  the  difference  in  measured  and  conq)uted  azimuth  or 
elevation.  Equation  (38)  is  the  only  equation  that  needs  to  be  process¬ 
ed  in  real  time  us  observations  are  made. 

In  practice,  of  course,  he  abo/e  procedure  is  applicable  only 
when  the  tnus  change  Ap  is  small  sc  that  the  llnf'arlt.y  assunrotiens 
hold.  The  case  when  the  change  in  orbit  parameter  is  large  so  that 
Eq.  (35)  is  not  a  good  approxiretion  wiil  require  different  treatment. 

We  shall  discuss  this  in  a  later  K.wnorandum. 
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